Three Way Search Engine Queries with Multi-feature Document Comparison for Plagiarism Detection

نویسندگان

  • Simon Suchomel
  • Jan Kasprzak
  • Michal Brandejs
چکیده

In this paper, we describe our approach at the PAN 2012 plagiarism detection competition. Our candidate retrieval system is based on extraction of three different types of web queries with narrowing their execution by skipping certain passages of an input document. Our detailed comparison system detects common features of input document pair, computing valid intervals from them, and then merging some detections in the postprocessing phase. We also discuss the relevance of current PAN 2012 settings to the real-world plagiarism detection systems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

External Plagiarism Detection based on Human Behaviors in Producing Paraphrases of Sentences in English and Persian Languages

With the advent of the internet and easy access to digital libraries, plagiarism has become a major issue. Applying search engines is one of the plagiarism detection techniques that converts plagiarism patterns to search queries. Generating suitable queries is the heart of this technique and existing methods suffer from lack of producing accurate queries, Precision and Speed of retrieved result...

متن کامل

Efficient Technique to Retrieve Plagiarized Documents for Plagiarism Detection

This paper details the approach of implementing an English plagiarism source retrieval system. A given document is broke down into segments by using TextTiling algorithm. These segments , are centered around certain topics within the document, key phrases are generated using KPMiner keyphrase extraction system. Segments and key phrases are used to create queries of the segment and document. Cha...

متن کامل

Diverse Queries and Feature Type Selection for Plagiarism Discovery Notebook for PAN at CLEF 2013

This paper describes approaches used for the Plagiarism Detection task in PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present modified three-way search methodology for Source Retrieval subtask and analyse snippet similarity performance. The results show, that presented approach is adaptable in real-world plagiarism situations. For the ...

متن کامل

Educated Guesses and Equality Judgments: Using Search Engines and Pairwise Match for External Plagiarism Detection

This paper describes the approaches taken to the two subtasks of Candidate Document Retrieval and Detailed Comparison, in the Plagiarism Detection track at PAN 12. For the first of these, we describe how we used a combination of frequency and a variation of a contrastive corpus measure to select keywords with which to make queries to the ChatNoir search system; for the second, we provide an ove...

متن کامل

Unsupervised Ranking for Plagiarism Source Retrieval Notebook for PAN at CLEF 2013

The source retrieval task for plagiarism detection involves the use of a search engine to retrieve candidate sources of plagiarism for a suspicious document and provides a way to efficiently identify candidate documents so that more accurate comparisons can take place. We describe a strategy for source retrieval that makes use of an unsupervised ranking method to rank the results returned by a ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012